Linear Regression and Logistic Regression

Linear regression (Boston housing price prediction)

(1) Data preprocessing
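import pandas as pd

cols = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
raw = pd.read_csv("data.txt", sep=r"\s+", header=None, names=cols)

# Every second row carries, in its first three columns, the B, LSTAT and MEDV
# values of the row before it. Fold those values back and keep one row per record.
first = raw.iloc[0::2].reset_index(drop=True)
second = raw.iloc[1::2].reset_index(drop=True)
first[['B', 'LSTAT', 'MEDV']] = second[['CRIM', 'ZN', 'INDUS']].to_numpy()
data = first
data  # inspect the data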
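
The preprocessing in step (1) folds every second row back into the row before it. A minimal sketch of that idea on a made-up sample (the column names `a` to `g` and the values are hypothetical, and the 5 + 2 layout is only a miniature of the real file) might look like this:

```python
import io
import pandas as pd

# Hypothetical miniature of the layout: each record is split over two lines,
# 5 values on the first line and the remaining 2 on the next.
sample = """\
1.0 2.0 3.0 4.0 5.0
6.0 7.0
10.0 20.0 30.0 40.0 50.0
60.0 70.0
"""
cols = ["a", "b", "c", "d", "e", "f", "g"]

# Short lines are padded with NaN, so each record first occupies two rows.
raw = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None, names=cols)

first = raw.iloc[0::2].reset_index(drop=True)   # rows 0, 2, ...
second = raw.iloc[1::2].reset_index(drop=True)  # rows 1, 3, ...

# Move the spilled values into the trailing columns of the first row.
first[["f", "g"]] = second[["a", "b"]].to_numpy()
merged = first
print(merged)
```

Slicing by row parity avoids writing into the frame while iterating over it, which is where chained assignment tends to produce warnings.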

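(2) Split into training and test sets

Since no separate test data is available, we split the dataset into a training set and a validation set.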

from sklearn.model_selection import train_test_split

data_X = data[['ZN', 'RM', 'PTRATIO', 'LSTAT']]  # features
data_y = data[['MEDV']]  # target

X_train, X_test, y_train, y_test = train_test_split(data_X, data_y, test_size=0.4)  # 60/40 split

X_train.shape, X_test.shape, y_train.shape, y_test.shape  # check the resulting shapes
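
To see what `test_size=0.4` does to the shapes, here is a small sketch on made-up arrays rather than the housing data. The `random_state` argument is an addition for this illustration only; it makes the shuffle repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(40).reshape(10, 4)  # 10 made-up samples with 4 features each
y = np.arange(10)

X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.4, random_state=0)
print(X_tr.shape, X_va.shape, y_tr.shape, y_va.shape)  # (6, 4) (4, 4) (6,) (4,)

# With the same random_state the split is identical on every call.
X_tr2, X_va2, y_tr2, y_va2 = train_test_split(X, y, test_size=0.4, random_state=0)
same_split = np.array_equal(X_va, X_va2)
```

Without a fixed `random_state`, each run draws a different split, so the error reported later will vary slightly from run to run.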

(3) Train the model

Fit the model on the training split.

from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
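
`LinearRegression` fits its coefficients by ordinary least squares. As a sanity check, the sketch below uses made-up, noise-free data (the coefficients 2 and -3 and the intercept 5 are arbitrary choices for illustration) and compares the fitted model with a direct least-squares solve in NumPy.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 2.0 * X[:, 0] - 3.0 * X[:, 1] + 5.0  # made-up linear relationship, no noise

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)

# The same fit as a least-squares problem on [X, 1]:
# the column of ones plays the role of the intercept.
A = np.column_stack([X, np.ones(len(X))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print(w)
```

On real data with noise the recovered coefficients will not match any "true" values exactly, but both approaches should still agree with each other.
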
(4) Compute predictions

Use the model to predict on the validation split.

y_pred = model.predict(X_test)
y_pred
(5) Compute the mean absolute error
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_test, y_pred)  # argument order is (y_true, y_pred)
mae
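
The mean absolute error is the average of |y_true - y_pred| over all samples. A small worked example on made-up numbers, checked against scikit-learn:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # made-up targets
y_hat = np.array([2.5, 0.0, 2.0, 8.0])    # made-up predictions

# Absolute errors are 0.5, 0.5, 0.0 and 1.0, so the mean is 0.5.
mae_manual = np.mean(np.abs(y_true - y_hat))
mae_sklearn = mean_absolute_error(y_true, y_hat)
print(mae_manual, mae_sklearn)  # 0.5 0.5
```

Because the absolute value is symmetric, swapping the two arguments gives the same number, but other metrics in `sklearn.metrics` are not symmetric, so keeping the `(y_true, y_pred)` order is a safe habit.
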
(6) Analyze the results
import matplotlib.pyplot as plt
fig = plt.figure(figsize = (20,10))
plt.rcParams['font.size'] = 15
plt.plot(range(y_test.shape[0]),y_test, linewidth=2, linestyle='-')
plt.plot(range(y_test.shape[0]),y_pred,linewidth=2, linestyle='-.')
plt.legend(['Test','Predict'])

Logistic regression (iris classification)

(1) Load the data
data = pd.read_csv("G:/大数据/机器学习实验/实验一/iris/iris/iris.data",sep = ',',names = ['Sepal_Length','Sepal_width','Petal_length','Petal_width','Class'])
data

(2) Check for missing values

The output shows that no values are missing.

data.isnull().sum()

(3) Split the dataset
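from sklearn.model_selection import train_test_split

# iloc: the first indexer selects rows, the second selects columns
X = data.iloc[:, :-1]
y = data.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)  # split the dataset
print(X)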

(4) Standardize the data
from sklearn.preprocessing import StandardScaler

# Standardize
stdsc = StandardScaler()  # instantiate
X_train_conti_std = stdsc.fit_transform(X_train[['Sepal_Length','Sepal_width','Petal_length','Petal_width']])  # fit on the training set and transform it
X_test_conti_std = stdsc.transform(X_test[['Sepal_Length','Sepal_width','Petal_length','Petal_width']])  # reuse the training-set statistics; do not refit on the test set
# Convert the ndarrays back to DataFrames
X_train_conti_std = pd.DataFrame(data=X_train_conti_std, columns=['Sepal_Length','Sepal_width','Petal_length','Petal_width'], index=X_train.index)
X_test_conti_std = pd.DataFrame(data=X_test_conti_std, columns=['Sepal_Length','Sepal_width','Petal_length','Petal_width'], index=X_test.index)
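
Standardization rescales each feature to (x - mean) / std, where the mean and standard deviation are normally taken from the training set only and then reused for the test set. The sketch below checks this on made-up arrays (not the iris data); `StandardScaler` uses the population standard deviation, which corresponds to `ddof=0` in NumPy.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_tr = rng.normal(loc=5.0, scale=2.0, size=(20, 3))  # made-up training features
X_te = rng.normal(loc=5.0, scale=2.0, size=(5, 3))   # made-up test features

scaler = StandardScaler()
X_tr_std = scaler.fit_transform(X_tr)  # learn mean/std from the training set
X_te_std = scaler.transform(X_te)      # apply the same mean/std to the test set

# Manual version of the same transformation.
mean = X_tr.mean(axis=0)
std = X_tr.std(axis=0)  # ddof=0, the population standard deviation
X_te_manual = (X_te - mean) / std
print(np.allclose(X_te_std, X_te_manual))
```

Refitting the scaler on the test set would give the test features a different mean and scale from those the model was trained on, which is why only `transform` is applied there.
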
(5) Build the logistic regression model
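from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Fit logistic regression on the standardized training set
classifier = LogisticRegression(random_state=0)  # instantiate the estimator
classifier.fit(X_train_conti_std, y_train)  # train the model

# Apply the model to the standardized test set and inspect the confusion matrix
y_pred = classifier.predict(X_test_conti_std)  # predict
cm = confusion_matrix(y_test, y_pred)  # the confusion matrix underlies many scoring metrics; named cm so the function is not shadowed
print(cm)  # print the confusion matrix

(6) Accuracy
print('Accuracy of logistic regression classifier on test set: {:.2f}'.format(classifier.score(X_test_conti_std, y_test)))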
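
The accuracy reported by `score` can also be read off the confusion matrix: correct predictions sit on the diagonal, so accuracy is the diagonal sum divided by the total count. A small sketch with made-up labels (not the actual iris predictions):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for illustration only.
y_true = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_hat = ["setosa", "setosa", "versicolor", "virginica", "virginica", "versicolor"]

# Rows are true classes, columns are predicted classes, in sorted label order.
cm = confusion_matrix(y_true, y_hat)
print(cm)

acc_from_cm = np.trace(cm) / cm.sum()  # diagonal = correct predictions
acc = accuracy_score(y_true, y_hat)
print(acc_from_cm, acc)
```

Here four of the six predictions are correct, so both values come out to 4/6. The off-diagonal entries show which classes are confused with each other, which a single accuracy number hides.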

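*Encoding

Some models require categorical variables to be converted to numbers through encoding. This model does not need it; the code below only demonstrates the effect.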

# Encode the categorical variable
data_dummy = pd.get_dummies(data[['Class']])  # one-hot encoding
data_conti = pd.DataFrame(data, columns=['Sepal_Length','Sepal_width','Petal_length','Petal_width'], index=data.index)
data = data_conti.join(data_dummy)  # join the numeric columns with the encoded columns
data
